27 research outputs found

    ETL Pipeline Resource Predictions in Distributed Data Warehouses

    Data warehouses of large corporations are increasing in size. Many companies have adopted a distributed data warehouse system, which may store data on many machines. Every day, millions of ETL jobs send data to those warehouses, but some jobs fail for lack of resources and need to be restarted. Predicting ETL resource demands in distributed data warehouse systems is crucial for efficient use of resources and for improved execution performance of ETL pipeline tasks. The subject of resource-demand prediction for the ETL data pipeline has not yet been discussed in the literature. This paper discusses a method of predicting resource demands based on history. The linear regression function y = kx + b is used to predict memory and disk usage, thus improving the accuracy of resource-usage estimates and the execution performance of ETL pipeline tasks.
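
A minimal sketch of the kind of history-based prediction the abstract describes: an ordinary least-squares fit of y = kx + b over past job runs. The abstract does not say which quantity plays the role of x, so the choice of input rows as the predictor, the sample numbers, and the names `fit_line` and `predict` are all assumptions made for illustration, not the paper's implementation.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = k*x + b; returns (k, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x; zero means all x values are identical.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    k = sxy / sxx
    b = y_bar - k * x_bar
    return k, b


def predict(k, b, x):
    """Evaluate the fitted line at x."""
    return k * x + b


# Hypothetical history: input size (millions of rows) vs. peak memory (GB).
history_rows = [1.0, 2.0, 3.0, 4.0]
history_mem_gb = [2.5, 4.5, 6.5, 8.5]

k, b = fit_line(history_rows, history_mem_gb)
next_job_mem_gb = predict(k, b, 5.0)
print(f"k={k:.2f}, b={b:.2f}, predicted memory={next_job_mem_gb:.2f} GB")
```

The same fit could be repeated with disk usage as y. A scheduler would probably add a safety margin on top of the prediction before reserving resources, though the abstract does not say whether the paper does so.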

    Improving the Performance of SQL Join Operation in the Distributed Enterprise Information System by Caching

    The enterprise information system (EIS) contains databases and other data sources in multiple data centers. Users query the EIS via clients, and each client has a working space in the cloud. Caching data in the client space reduces the total execution time of a query. However, the client space has limited resources for storing data. There are two options for caching data in the client space: caching the final results of query operations, or caching the source data tables. The problem is that some query operations, such as joining multiple big tables, will in some cases produce a result too big to store in the cache. Caching the source data tables may be a better choice in those situations. This paper presents an algorithm that combines active caching and passive caching to improve the cache hit rate, thus improving the performance of SQL join queries in the cloud computing environment.
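
The trade-off between the two caching options can be sketched as a simple size check. This is only an illustration of the choice described in the abstract, not the paper's combined active/passive caching algorithm, whose details the abstract does not give. The function name `choose_cache_target` and the byte-size inputs are hypothetical.

```python
def choose_cache_target(result_bytes, source_table_bytes, free_bytes):
    """Pick what to keep in the client space for one join query.

    result_bytes: estimated size of the join result
    source_table_bytes: list with the size of each source table
    free_bytes: space still available in the client cache
    """
    # Prefer the final result: a later identical query needs no recomputation.
    if result_bytes <= free_bytes:
        return "result"
    # A join of big tables can exceed the cache; the source tables may still fit.
    if sum(source_table_bytes) <= free_bytes:
        return "source_tables"
    # Neither option fits in the limited client space.
    return "none"


# Hypothetical sizes: two 40 MB tables whose join produces 500 MB.
MB = 1024 * 1024
decision = choose_cache_target(500 * MB, [40 * MB, 40 * MB], 100 * MB)
print(decision)
```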

    Improving the Data Warehouse Architecture Using Design Patterns

    Data warehousing is an important part of the enterprise information system. Business intelligence (BI) relies on data warehouses to improve business performance, and data quality plays a key role in BI. Source data is extracted, transformed, and loaded (ETL) into the data warehouses periodically. The ETL operations have the most crucial impact on the data quality of the data warehouse. ETL-related data warehouse architectures, including structure-oriented layer architectures and the enterprise-view data mart architecture, have been studied in the literature. Existing architectures have the layer and data mart components but do not make use of design patterns; thus, those approaches are inefficient and pose potential problems. This paper describes how to use design patterns to improve data warehouse architectures.

    How to Secure the Cloud based Enterprise Information System - A Case Study on Security Education as the Critical Foundation for a MS-EIS Program

    This paper presents a case study of a new Master of Science in Enterprise Information Systems program created at Colorado Technical University, in which security courses make up over 20% of all classes in the program. Should there be such a high emphasis on security courses? Reviewing the performance of the first class of students in the program's Enterprise Information System Capstone course, we conclude that the investment in security education is absolutely necessary. These courses have laid the critical foundation for students to correctly handle today's ever-growing, real-world Enterprise Information Systems challenges.

    Real Time Virtual Laboratory Solution Prototype and Evaluation for Online Engineering Degree Programs

    One of the challenges of online engineering education is to provide students with hands-on laboratory experiences that normally require being in an on-campus laboratory. Virtual laboratory solutions have been developed over the last decade to allow learners to simulate engineering systems online or to connect to predesigned system modules within physical laboratories. However, these predesigned solutions must be acquired as software and hardware components, which require a certain budget and training time before they can be used. In addition, these solutions do not concentrate on constructing the systems under experimentation, but rather on testing the predesigned module using a virtual application that connects to it. In this paper, we developed a solution that enables online learners to build virtual systems step by step at their end and connect them to real-time on-campus labs to perform remote experiments with logic gate systems. We used a combination of technologies: Virtual Network Computing (VNC), Video Conferencing (VC), and Object Oriented Programming (OOP). Our solution was demonstrated in practice using Python running on a Raspberry Pi system to construct sample virtual logic gate applications. This allows online students to concentrate on constructing logic gate systems remotely, step by step, and to control actual physical logic gate systems within an on-campus lab with the help of a webcam. Our solution was tested by a group of learners and was shown to be a cost-effective alternative to traditional laboratory experiences.
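
As an illustration of how virtual logic gates might be built step by step with object-oriented Python, the following sketch composes an XOR circuit from AND, OR, and NOT gates. All class and function names here are hypothetical and are not taken from the paper. The sketch is purely a software simulation and does not touch Raspberry Pi hardware pins or the VNC and video-conferencing parts of the solution.

```python
class Gate:
    """Base class: every gate exposes a boolean output()."""

    def output(self):
        raise NotImplementedError


class Input(Gate):
    """A switch whose value the learner sets directly."""

    def __init__(self, value=False):
        self.value = value

    def output(self):
        return bool(self.value)


class And(Gate):
    def __init__(self, a, b):
        self.a, self.b = a, b

    def output(self):
        return self.a.output() and self.b.output()


class Or(Gate):
    def __init__(self, a, b):
        self.a, self.b = a, b

    def output(self):
        return self.a.output() or self.b.output()


class Not(Gate):
    def __init__(self, a):
        self.a = a

    def output(self):
        return not self.a.output()


def build_xor(a, b):
    """XOR assembled from basic gates: (a OR b) AND NOT (a AND b)."""
    return And(Or(a, b), Not(And(a, b)))


# Wire up the circuit once, then flip the inputs and re-read the output.
a, b = Input(), Input()
xor = build_xor(a, b)
truth_table = []
for va in (False, True):
    for vb in (False, True):
        a.value, b.value = va, vb
        truth_table.append((va, vb, xor.output()))

for row in truth_table:
    print(row)
```

Because each gate reads its inputs lazily when `output()` is called, a learner can change an `Input` after the circuit is wired and observe the new result without rebuilding anything.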

    Enhancing the robustness of recommender systems against spammers.

    The accuracy and diversity of recommendation algorithms have long been a research focus in recommender systems. A good recommender system should not only have high accuracy and diversity, but also be adequately robust against spammer attacks. However, the issue of recommendation robustness has received relatively little attention in the literature. In this paper, we systematically study the influence of different spammer behaviors on the recommendation results of various recommendation algorithms. We further propose an improved algorithm that incorporates the inner-similarity of a user's purchased items into the classic KNN approach. The new algorithm effectively enhances robustness against spammer attacks and thus outperforms traditional algorithms in recommendation accuracy and diversity when spammers are present in online commercial systems.